
    Global permutation tests for multivariate ordinal data: alternatives, test statistics, and the null dilemma

    We discuss two-sample global permutation tests for sets of multivariate ordinal data in possibly high-dimensional setups, motivated by the analysis of data collected by means of the World Health Organisation's International Classification of Functioning, Disability and Health. The tests do not require any modelling of the multivariate dependence structure. Specifically, we consider testing for marginal inhomogeneity and direction-independent marginal order. Max-T test statistics are known to yield good power against alternatives with few strong individual effects; we propose test statistics that can be seen as their counterparts for alternatives with many weak individual effects. Permutation tests are valid only if the two multivariate distributions are identical under the null hypothesis. By means of simulations, we examine the practical impact of violations of this exchangeability condition. Our simulations suggest that theoretically invalid permutation tests can still be 'practically valid'. In particular, they suggest that the degree to which the permutation procedure fails can be described as a function of the difference in group-specific covariance matrices, the ratio of group sizes, the number of variables in the set, the test statistic used, and the number of levels per variable.
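To illustrate the contrast the abstract draws between the two families of test statistics, the following Python sketch runs a two-sample global permutation test with both a max-T-type combination (sensitive to few strong effects) and a sum-of-squares combination (sensitive to many weak effects). The choice of standardized mean differences as per-variable statistics, the data-generating setup, and all parameter values are illustrative assumptions, not the authors' exact procedure.

```python
# Hedged sketch: global permutation test on multivariate ordinal data,
# comparing a max-T combination with a sum-of-squares combination.
import numpy as np

rng = np.random.default_rng(0)

def per_variable_t(x, y):
    """Standardized mean differences, one per variable (illustrative choice)."""
    nx, ny = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = np.sqrt(x.var(axis=0, ddof=1) / nx + y.var(axis=0, ddof=1) / ny)
    return diff / pooled

def permutation_pvalues(x, y, n_perm=2000):
    """P-values for the max-T and sum-of-squares global statistics."""
    data = np.vstack([x, y])
    nx = len(x)
    t_obs = per_variable_t(x, y)
    obs_max, obs_sum = np.max(np.abs(t_obs)), np.sum(t_obs ** 2)
    count_max = count_sum = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(data))
        t = per_variable_t(data[perm[:nx]], data[perm[nx:]])
        count_max += np.max(np.abs(t)) >= obs_max
        count_sum += np.sum(t ** 2) >= obs_sum
    # add-one correction keeps the permutation p-values valid
    return (count_max + 1) / (n_perm + 1), (count_sum + 1) / (n_perm + 1)

# a "many weak effects" alternative: small shift on every ordinal variable
x = rng.integers(0, 5, size=(40, 30)).astype(float)
y = rng.integers(0, 5, size=(40, 30)).astype(float) + 0.3
p_max, p_sum = permutation_pvalues(x, y)
print(p_max, p_sum)
```

Under such diffuse alternatives the sum-type statistic typically attains the smaller p-value, which is the motivation the abstract gives for proposing counterparts to max-T statistics.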

    A microRNA molecular modeling extension for prediction of colorectal cancer treatment

    Background: Several studies show that the regulatory impact of microRNAs (miRNAs) is an essential contribution to the pathogenesis of colorectal cancer (CRC). The expression levels of diverse miRNAs are associated with specific clinical diagnoses and prognoses of CRC. However, this association reveals very little actionable information with regard to how, or whether, to treat a CRC patient. To address this problem, we use miRNA expression data along with other molecular information to predict the individual response of CRC cell lines and CRC patients. Methods: We developed a strategy to join four types of information: molecular, kinetic, genetic, and treatment data, for the prediction of individual treatment response in CRC. Results: Information on miRNA regulation, including miRNA target regulation and transcriptional regulation of miRNA, is integrated into an in silico molecular model for colon cancer. This molecular model is applied to study the responses of seven CRC cell lines from NCI-60 to ten agents targeting signaling pathways. Predictions from models without and with the miRNA information are compared, showing advantages for the extended model. Finally, the extended model is applied to the data of 22 CRC patients to predict response to treatment with sirolimus and LY294002. The in silico results also replicate the oncogenic and tumor-suppressive roles of miRNAs in therapeutic response as reported in the literature. Conclusions: In summary, the results reveal that detailed molecular events can be combined with individual genetic data, including gene/miRNA expression data, to enhance the in silico prediction of the therapeutic response of individual CRC tumors. The study demonstrates that miRNA information can be applied as actionable information regarding individual therapeutic response.

    Validating the knowledge bank approach for personalized prediction of survival in acute myeloid leukemia: a reproducibility study

    Reproducibility is not only essential for the integrity of scientific research but is also a prerequisite for model validation and refinement for the future application of predictive algorithms. However, reproducible research is becoming increasingly challenging, particularly in high-dimensional genomic data analyses with complex statistical or algorithmic techniques. Given that there are no mandatory requirements in most biomedical and statistical journals to provide the original data, analytical source code, or other relevant materials for publication, accessibility to these supplements naturally suggests a greater credibility of the published work. In this study, we performed a reproducibility assessment of the notable paper by Gerstung et al. (Nat Genet 49:332–340, 2017) by rerunning the analysis using their original code and data, which are publicly accessible. Despite an open science setting, it was challenging to reproduce the entire research project; reasons included: incomplete data and documentation, suboptimal code readability, coding errors, limited portability of intensive computing performed on a specific platform, and an R computing environment that could no longer be re-established. We learn that the availability of code and data does not guarantee transparency and reproducibility of a study; paradoxically, the source code is still liable to error and obsolescence, essentially due to methodological and computational complexity, a lack of reproducibility checking at submission, and updates for software and operating environment. The complex code may also hide problematic methodological aspects of the proposed research. Building on the experience gained, we discuss the best programming and software engineering practices that could have been employed to improve reproducibility, and propose practical criteria for the conduct and reporting of reproducibility studies for future researchers. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00439-022-02455-8.

    Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: An example from a vertigo phase III study with longitudinal count data as primary endpoint

    Background: A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. Writing a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on, and justification of, many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex, not easily implemented, and have several limitations. It is therefore of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods: We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions, which are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC), the probability integral transform (PIT), and proper scoring rules (e.g. the logarithmic score).
Results: The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions: The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
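To make the role of the mean logarithmic score concrete, here is a hedged Python sketch that ranks a Poisson model against a negative binomial model for overdispersed counts by the mean leave-one-out log predictive density (higher is better). The abstract's analysis obtains leave-one-out predictive densities directly from INLA; this toy version refits by simple maximum-likelihood and moment estimates instead, and all data and values are invented for illustration.

```python
# Hedged sketch: model ranking by mean leave-one-out logarithmic score.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# overdispersed counts: Poisson rates mixed over a gamma distribution
y = rng.poisson(rng.gamma(shape=2.0, scale=3.0, size=200))

def loo_log_score(y, logpdf_from_train):
    """Mean log predictive density, leaving each observation out in turn."""
    scores = []
    for i in range(len(y)):
        train = np.delete(y, i)
        scores.append(logpdf_from_train(train, y[i]))
    return float(np.mean(scores))

def poisson_score(train, y_new):
    lam = train.mean()                       # ML estimate of the rate
    return stats.poisson.logpmf(y_new, lam)

def negbin_score(train, y_new):
    m, v = train.mean(), train.var(ddof=1)   # moment estimates
    r = m ** 2 / max(v - m, 1e-9)            # NB size parameter
    p = r / (r + m)                          # NB probability parameter
    return stats.nbinom.logpmf(y_new, r, p)

s_pois = loo_log_score(y, poisson_score)
s_nb = loo_log_score(y, negbin_score)
print(s_pois, s_nb)  # higher mean log score = better predictive fit
```

For strongly overdispersed data the negative binomial model attains the higher mean log score, mirroring the abstract's point that a conventional Poisson random effects model is often inappropriate for real-life count data.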

    Estimating individual treatment effects from responses and a predictive biomarker in a parallel group RCT

    To administer the better of two treatments to an individual patient, it is necessary to know the individual treatment effects (ITEs) of the considered subjects and the correlation between the possible responses (PRs) to the two treatments. When data are generated in a parallel-group RCT, the ITE of a single subject cannot be determined, since we only observe two samples from the marginal distributions of these PRs and not the corresponding joint distribution, due to the 'Fundamental Problem of Causal Inference' [Holland, 1986, p. 947]. In this article, we present a counterfactual approach for estimating the joint distribution of two normally distributed responses to two treatments. This joint distribution can be estimated by assuming a normal joint distribution for the PRs and by using a normally distributed baseline biomarker which is defined to be functionally related to the sum of the ITE components. Such a functional relationship is plausible since the biomarker and the sum encode the same information in an RCT, namely the variation between subjects. As a consequence of interpreting the biomarker as a proxy for the sum of ITE components, the estimation of the joint distribution is subject to constraints. These constraints can be framed in the context of linear regressions, with regard to the proportions of variance in the responses explained and to the residual variation. As a result, new light is shed on the presence of treatment-biomarker interactions. We apply our approach to a classical medical data example on exercise and heart rate.
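The setup described above can be made concrete with a small simulation: each subject carries two potential responses (PRs) drawn from a bivariate normal, the biomarker is a noisy proxy for their sum, and randomization reveals only one PR per subject, which is exactly the Fundamental Problem of Causal Inference. All numerical values are invented for illustration and are not from the article.

```python
# Hedged sketch: a parallel-group RCT reveals the marginals but hides the ITEs.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
mean = [10.0, 12.0]                      # mean response under treatments A and B
cov = [[4.0, 2.0], [2.0, 4.0]]           # PR correlation rho = 0.5 (unobservable)
pr = rng.multivariate_normal(mean, cov, size=n)
biomarker = pr.sum(axis=1) + rng.normal(0.0, 1.0, size=n)

ite = pr[:, 1] - pr[:, 0]                # individual treatment effects
arm = rng.integers(0, 2, size=n)         # randomization hides one PR per subject
observed = pr[np.arange(n), arm]

# the average effect is estimable from the trial, the individual ITEs are not
naive = observed[arm == 1].mean() - observed[arm == 0].mean()
# the biomarker tracks the sum of the two PRs, as the approach assumes
r_bm = np.corrcoef(biomarker, pr.sum(axis=1))[0, 1]
print(naive, ite.mean(), r_bm)
```

In the simulation the difference of observed group means recovers the mean ITE, while the subject-level ITEs and the PR correlation remain hidden; the biomarker's strong correlation with the PR sum is what the counterfactual approach exploits to constrain the joint distribution.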

    Statistical process monitoring to improve quality assurance of inpatient care

    BACKGROUND Statistical Process Monitoring (SPM) is not typically used in traditional quality assurance of inpatient care. While SPM allows rapid detection of performance deficits, SPM results depend strongly on the characteristics of the evaluated process. When using SPM to monitor inpatient care, the hospital risk profile, hospital volume, and properties of each monitored performance indicator (e.g. its baseline failure probability) influence the results and must be taken into account to ensure a fair process evaluation. Here we study the use of CUSUM charts constructed for a predefined false alarm probability within a single process, i.e. a given hospital and performance indicator. We furthermore assess different monitoring schemes based on the resulting CUSUM chart and their dependence on the process characteristics. METHODS We conduct simulation studies to investigate the alarm characteristics of the Bernoulli log-likelihood CUSUM chart for crude and risk-adjusted performance indicators, and illustrate CUSUM charts on performance data from the external quality assurance of hospitals in Bavaria, Germany. RESULTS Simulating CUSUM control limits for a given false alarm probability makes it possible to control the number of false alarms across different conditions and monitoring schemes. We gained a better understanding of the effect of different factors on the alarm rates of CUSUM charts, and propose using simulations to assess the performance of implemented CUSUM charts. CONCLUSIONS The presented results and example demonstrate the application of CUSUM charts for fair performance evaluation of inpatient care. We propose simulating CUSUM control limits while taking hospital and process characteristics into account.
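A minimal sketch of the monitoring scheme described above, assuming a crude (non-risk-adjusted) binary indicator: a Bernoulli log-likelihood CUSUM whose control limit is simulated so that the in-control false-alarm probability over a fixed run length matches a target. The baseline failure probability, the out-of-control odds ratio, and the run length are invented values, and the abstract's actual charts additionally use per-patient risk adjustment.

```python
# Hedged sketch: Bernoulli log-likelihood CUSUM with a simulated control limit.
import numpy as np

rng = np.random.default_rng(2)

def cusum_max(outcomes, p0, odds_ratio=2.0):
    """Peak of the Bernoulli log-likelihood CUSUM over a run of binary outcomes."""
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # out-of-control probability
    up = np.log(p1 / p0)                 # increment after a failure
    down = np.log((1 - p1) / (1 - p0))   # increment after a success
    s = peak = 0.0
    for failed in outcomes:
        s = max(0.0, s + (up if failed else down))
        peak = max(peak, s)
    return peak

def control_limit(p0, n, alpha=0.05, n_sim=2000):
    """Limit h such that P(CUSUM ever exceeds h | in control) is about alpha."""
    peaks = [cusum_max(rng.random(n) < p0, p0) for _ in range(n_sim)]
    return float(np.quantile(peaks, 1 - alpha))

p0 = 0.1                                 # baseline failure probability (invented)
h = control_limit(p0, n=200)
# a hospital whose true failure probability has doubled
signal = cusum_max(rng.random(200) < 2 * p0, p0) > h
print(h, signal)
```

Because h is simulated per hospital volume n and baseline p0, the false-alarm probability is held constant across hospitals with different characteristics, which is the fairness property the abstract argues for.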

    Incorporation of Multiple Sources into IT and Data Protection Concepts: Lessons Learned from the FARKOR Project

    We present the IT and data protection concept of the FAmiliäres Risiko für das KOloRektale Karzinom (FARKOR) project. FARKOR is a risk-adapted screening project in Bavaria, Germany, focusing on young adults with familial colorectal cancer (CRC). For each participant, data from different sources have to be integrated: treatment records centrally administered by the resident doctors' association (KVB), data from health insurance companies (HIC), and patient-reported lifestyle data. Patient privacy rights must be observed. Record linkage is performed by a central independent trust center. Data are decrypted, integrated, and analyzed in a secure part of the scientific evaluation center with no connection to the internet (SECSP). The presented concept guarantees participants' privacy through distinct identifiers, separation of responsibilities, data pseudonymization, public-private key encryption of medical data, and encrypted data transfer.

    State-of-the-Art in Parallel Computing with R

    R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets, and methodological advances drive increased use of simulations; a common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly useful for general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems four different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.

    affyPara—a Bioconductor Package for Parallelized Preprocessing Algorithms of Affymetrix Microarray Data

    Microarray data repositories as well as large clinical applications of gene expression make it possible to analyse several hundred microarrays at a time. The preprocessing of large numbers of microarrays is still a challenge, as the algorithms are limited by the available computer hardware. For example, building classification or prognostic rules from large microarray sets is very time-consuming, and preprocessing has to be part of the cross-validation and resampling strategy that is necessary to estimate the rule's prediction quality honestly.

    Risk of advanced colorectal neoplasia according to age and gender.

    Colorectal cancer (CRC) is one of the leading causes of cancer-related morbidity and death. Despite the fact that the mean age at diagnosis of CRC is lower in men, screening by colonoscopy or fecal occult blood test (FOBT) is initiated at the same age in both genders. The prevalence of the common CRC precursor lesion, advanced adenoma, is well documented only in the screening population. The purpose of this study was to assess the risk of advanced adenoma at ages below screening age. We analyzed data from a census of 625,918 outpatient colonoscopies performed in adults in Bavaria between 2006 and 2008, and developed a logistic regression model to determine gender- and age-specific risk of advanced neoplasia. Advanced neoplasia was found in 16,740 women (4.6%) and 22,684 men (8.6%). Male sex was associated with an overall increased risk of advanced neoplasia (odds ratio 1.95; 95% confidence interval, CI, 1.91 to 2.00). At any age and in any indication group, more colonoscopies were needed in women than in men to detect advanced adenoma or cancer. At age 75, 14.8 (95% CI, 14.4-15.2) screening colonoscopies, 18.2 (95% CI, 17.7-18.7) diagnostic colonoscopies, and 7.9 (95% CI, 7.6-8.2) colonoscopies following up a positive FOBT (FOBT colonoscopies) were needed to find one advanced adenoma in women; at age 50, 39.0 (95% CI, 38.0-40.0) diagnostic and 16.3 (95% CI, 15.7-16.9) FOBT colonoscopies were needed. Comparable numbers were reached 20 and 10 years earlier in men, respectively. At any age and independent of the indication for colonoscopy, men are at higher risk of having advanced neoplasia diagnosed upon colonoscopy than women. This suggests that starting screening earlier in life in men than in women might result in a relevant increase in the detection of asymptomatic preneoplastic and neoplastic colonic lesions.
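The "number of colonoscopies needed" reported in the abstract is simply the reciprocal of the model-based risk. A hedged sketch of that relationship follows; the intercept and age coefficient are invented for illustration, and only the odds ratio of 1.95 for male sex is taken from the abstract.

```python
# Hedged sketch: a logistic risk model turns into "colonoscopies needed" as 1/risk.
import math

def risk(age, male, b0=-4.5, b_age=0.045, b_male=math.log(1.95)):
    """Illustrative logistic model: logit(risk) = b0 + b_age*age + b_male*male.
    b0 and b_age are invented; b_male reflects the reported OR of 1.95."""
    logit = b0 + b_age * age + b_male * male
    return 1.0 / (1.0 + math.exp(-logit))

def colonoscopies_needed(age, male):
    """Expected number of colonoscopies to find one advanced neoplasia."""
    return 1.0 / risk(age, male)

# at the same age, fewer colonoscopies are needed in men than in women
print(colonoscopies_needed(55, male=0), colonoscopies_needed(55, male=1))
```

Because the male coefficient shifts the logit by log(1.95), men reach any given risk level, and hence any given "number needed", at a younger age than women, which is the abstract's argument for gender-specific screening ages.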